A simple adaptive estimator of the integrated square of a density
Given an i.i.d. sample $X_1, \dots, X_n$ with common bounded density $f_0$ belonging to a Sobolev space of order $\alpha$ over the real line, estimation of the quadratic functional $\int_{\mathbb{R}} f_0^2(x)\,dx$ is considered. It is shown that the simplest kernel-based plug-in estimator

$$\frac{2}{n(n-1)h_n}\sum_{1\le i<j\le n} K\left(\frac{X_i-X_j}{h_n}\right)$$

is asymptotically efficient if $\alpha > 1/4$ and rate-optimal if $\alpha \le 1/4$. A data-driven rule to choose the bandwidth $h_n$ is then proposed, which does not depend on prior knowledge of $\alpha$, so that the corresponding estimator is rate-adaptive for $\alpha \le 1/4$ and asymptotically efficient if $\alpha > 1/4$.

Keywords: adaptive estimation; kernel density estimator; quadratic functional 

1. Introduction 

The estimation of a quadratic functional of a density $f_0$, in particular of $\int f_0^2$, has attracted much interest in the literature since Bickel and Ritov (1988) showed that such functionals can be estimated at the rate $1/\sqrt{n}$ if $f_0$ is $\alpha$-Hölder of order $\alpha > 1/4$ and that this rate cannot be achieved if $\alpha < 1/4$. Such functionals have several statistical applications. For instance, $\int f_0^2$ occurs in Taylor expansions of more complex integral functionals, such as the entropy $\int f_0 \log f_0$; see, for example, Laurent (1996). They are also part of constants appearing in the exact expression of the MISE of kernel density estimators and hence their estimates can be used in optimal bandwidth selection. Bickel and Ritov (1988) constructed an efficient and $\sqrt{n}$-consistent kernel-based estimator for $\int f_0^2$ and Laurent (1996) achieved the same for an estimator based on orthogonal series. The treatment of the bias term by these authors necessitated rather complicated expressions for the actual estimators, which consist of the difference of two U-statistics. As a first goal of this article, we show that the simplest 'plug-in' kernel-based estimator, introduced in Hall and Marron (1987),



$$T_n(h_n) = \frac{2}{n(n-1)h_n}\sum_{1\le i<j\le n} K\left(\frac{X_i-X_j}{h_n}\right), \qquad (1)$$



where the $X_i$ are i.i.d. with common density $f_0$ on the real line, also does the job. ($T_n$ is obtained as follows. Estimate $f_0$ by the usual kernel density estimator and estimate integration with respect to $f_0$ by integration with respect to the empirical measure, then delete the diagonal terms.) Our main point here consists of an observation on the bias term based on smoothing properties of convolutions, borrowed in part from Giné and Nickl (2007). We note that in the context of goodness-of-fit tests, Butucea (2007) considered a different (but related) kernel-based estimator for $\int f_0^2$, where $K(x)$ must equal $\sin(x)/(\pi x)$. Our results also hold for her estimator, without any (other than the usual) restrictions on the kernel; see Remark 1 below. The same methods can be applied, with the natural changes, to other quadratic functionals, such as $\int (f_0^{(k)})^2$ for $k > 0$.
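To make (1) concrete, here is a minimal Python/NumPy sketch that computes $T_n(h)$ directly from the pairwise sum and, separately, through the plug-in description just given: the kernel density estimate is evaluated at the sample points (integration against the empirical measure) and the $n$ diagonal terms $i = j$ are removed. The Gaussian kernel and the function names are illustrative assumptions, not part of the paper; any kernel satisfying the conditions of Section 3 could be substituted. The pairwise matrix needs $O(n^2)$ memory, so the sketch is meant for moderate $n$ only.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density: symmetric, bounded, integrates to 1 (an assumed choice of K)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def t_n(x, h, kernel=gaussian_kernel):
    """Estimator (1): 2 / (n (n-1) h) * sum over i < j of K((X_i - X_j) / h)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diffs = (x[:, None] - x[None, :]) / h      # all scaled differences (X_i - X_j) / h
    i, j = np.triu_indices(n, k=1)             # index pairs with i < j
    return 2.0 * kernel(diffs[i, j]).sum() / (n * (n - 1) * h)


def t_n_plug_in(x, h, kernel=gaussian_kernel):
    """Same statistic, built as in the text: KDE integrated against the empirical
    measure (i.e. summed over the sample points), with the diagonal i == j deleted."""
    x = np.asarray(x, dtype=float)
    n = x.size
    all_pairs = kernel((x[:, None] - x[None, :]) / h).sum() / h   # sum over all (i, j)
    diagonal = n * kernel(0.0) / h                                # the n terms with i == j
    return (all_pairs - diagonal) / (n * (n - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal(1000)
    bw = sample.size ** (-0.4)
    # For the N(0, 1) density the target is 1 / (2 sqrt(pi)), about 0.2821.
    print(t_n(sample, bw), t_n_plug_in(sample, bw), 1.0 / (2.0 * np.sqrt(np.pi)))
```

Both functions compute the same U-statistic; the second one only makes the "plug-in, then delete the diagonal" reading of (1) explicit.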

As is well known, efficient estimation of $\int f_0^2$ is possible if $f_0$ is in a Sobolev space of order $\alpha > 1/4$, but in the 'low regularity case' $\alpha \le 1/4$, the best rate of convergence is $n^{-4\alpha/(4\alpha+1)}$. We show that $T_n(h_n)$ achieves this rate if one chooses the bandwidth $h_n$ of the right order, where $h_n$ depends on the unknown quantity $\alpha$. It is then natural to ask whether one can choose the bandwidth in some data-dependent way, so as to obtain an estimator of $\int f_0^2$ which is rate-adaptive over Sobolev balls if $\alpha \le 1/4$ and efficient if $\alpha > 1/4$. Using Lepski's method (Lepski (1990), Lepski and Spokoiny (1997)), we show that this is in fact possible for the simple estimator $T_n(h_n)$. Rate-adaptive estimation of $\int f_0^2$ was first considered by Efromovich and Low (1996), and more recently by, for example, Laurent and Massart (2000), Laurent (2005), Cai and Low (2006) and Klemelä (2006). None of these authors used kernel-based estimators, and, except for Laurent (2005), all of them worked in the context of a Gaussian white noise model. Since we are interested in the low-regularity case where $\alpha \le 1/4$, the restriction to the Gaussian white noise model is inconvenient, as it is not clear how asymptotic results in the Gaussian white noise model translate into the usual density model in this case. It turns out that deriving our results in the more general density model on the real line leads to no major complications. Our derivations rely on elementary U-statistic theory, some simple Fourier analytic methods and a recent exponential inequality for canonical U-statistics of order 2 due to Giné, Latała and Zinn (2000), with constants obtained in Houdré and Reynaud-Bouret (2003). A discussion of the relationship of our results to those in Laurent (2005) is given in Remark 5 below.

2. Basic setup 

We will assume that the probability density $f_0$ is bounded, that is, $f_0 \in L^\infty := L^\infty(\mathbb{R})$, and contained in a Sobolev space of order $\alpha > 0$, defined as follows. First, denote by $L^p := L^p(\mathbb{R}, \lambda)$ the usual spaces of measurable functions $\phi$ satisfying $\|\phi\|_p^p := \int_{\mathbb{R}} |\phi(x)|^p\,dx < \infty$ for $1 \le p < \infty$. For $\phi \in L^1$, we define the Fourier transform by $F\phi(u) = \int_{\mathbb{R}} e^{-ixu}\phi(x)\,dx$ and extend it continuously to $L^2$. ($F$ is, up to a multiplicative constant, the Fourier-Plancherel transform.) We then set

$$H_2^\alpha = H_2^\alpha(\mathbb{R}) := \left\{\phi \in L^2 : \|\phi\|_{2,\alpha} := \left\|F\phi(\cdot)\,(1+|\cdot|^2)^{\alpha/2}\right\|_2 < \infty\right\}.$$

We note that a common equivalent characterization of $H_2^\alpha$ is in terms of integrated $L^2$-Hölder conditions: for $\phi \in L^2$ and $0 < \alpha < 1$, define

$$I_\alpha(\phi) := \int_{\mathbb{R}}\int_{\mathbb{R}} \frac{|\phi(x+t)-\phi(x)|^2}{|t|^{1+2\alpha}}\,dx\,dt.$$

It can then be shown that $\phi \in H_2^\alpha$ if and only if $\phi \in L^2$ and $I_\alpha(\phi) < \infty$ (cf. page 144 in Malliavin (1995)). Throughout the proofs, we will freely use these and other standard facts from Fourier analysis, as well as Young's inequalities for convolutions. We refer, for example, to Chapter III in Malliavin (1995) or Chapter 8 in Folland (1999). Also, unless otherwise indicated, all integrals in this article will be over the real line.
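With the convention $F\phi(u) = \int e^{-ixu}\phi(x)\,dx$ (no $2\pi$ in the exponent), Plancherel's identity reads $\|\phi\|_2^2 = (2\pi)^{-1}\|F\phi\|_2^2$, a fact used repeatedly in the proofs below. The following sketch checks this numerically for the standard normal density, whose transform under this convention is $e^{-u^2/2}$, and evaluates $\|\phi\|_{2,\alpha}$ by quadrature. The Gaussian test function, the grids and the Riemann-sum quadrature are assumptions made only for illustration.

```python
import numpy as np

# Spatial and frequency grids (an illustrative choice); the Gaussian integrands
# below are negligible outside [-12, 12].
x = np.linspace(-12.0, 12.0, 2001)
u = np.linspace(-12.0, 12.0, 1001)
dx, du = x[1] - x[0], u[1] - u[0]

phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)   # standard normal density


def fourier_transform(values, x, dx, u):
    """F phi(u) = int exp(-i x u) phi(x) dx, approximated by a Riemann sum."""
    return np.exp(-1j * np.outer(u, x)) @ values * dx


def sobolev_norm(F_vals, u, du, alpha):
    """||phi||_{2,alpha} = || F phi(.) (1 + |.|^2)^{alpha/2} ||_2, by a Riemann sum."""
    return np.sqrt(np.sum(np.abs(F_vals) ** 2 * (1.0 + u**2) ** alpha) * du)


F_phi = fourier_transform(phi, x, dx, u)
l2_sq = np.sum(phi**2) * dx                                    # ||phi||_2^2 = 1 / (2 sqrt(pi))
plancherel = np.sum(np.abs(F_phi) ** 2) * du / (2.0 * np.pi)   # (2 pi)^{-1} ||F phi||_2^2
print(l2_sq, plancherel, sobolev_norm(F_phi, u, du, 0.3))
```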

It is also convenient to introduce U-statistic notation. For a symmetric function of two variables $R(x,y)$, we write

$$U_n^{(2)}(R) := \frac{2}{n(n-1)}\sum_{1\le i<j\le n} R(X_i, X_j).$$

We recall (e.g., de la Peña and Giné (1999), page 137) that the Hoeffding projections of $R$ are

$$\pi_1 R(x) = ER(x, X_1) - ER(X_1, X_2),$$

$$\pi_2 R(x,y) = R(x,y) - ER(x, X_1) - ER(y, X_1) + ER(X_1, X_2),$$

which induce the Hoeffding decomposition

$$U_n^{(2)}(R) - ER(X_1, X_2) = 2U_n^{(1)}(\pi_1 R) + U_n^{(2)}(\pi_2 R), \qquad (2)$$

where $U_n^{(1)}(\pi_1 R) = n^{-1}\sum_{i=1}^n (\pi_1 R)(X_i)$. Note that, by orthogonality,

$$E\big(U_n^{(1)}(\pi_1 R)\big)^2 = n^{-1}E\big((\pi_1 R)(X_1)\big)^2, \qquad E\big(U_n^{(2)}(\pi_2 R)\big)^2 = \frac{2}{n(n-1)}E\big((\pi_2 R)(X_1, X_2)\big)^2.$$
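As a sanity check of (2), the sketch below evaluates both sides for the kernel $R(x,y) = K_h(x-y)$ with a Gaussian $K$ and $N(0,1)$ data, a case in which the projections are available in closed form: $ER(x,X_1)$ is the $N(0,1+h^2)$ density at $x$ and $ER(X_1,X_2)$ is the $N(0,2+h^2)$ density at 0. These Gaussian choices and all function names are assumptions for illustration. Since (2) is an algebraic identity once $\pi_1R$ and $\pi_2R$ are formed, both sides agree for every sample up to rounding.

```python
import numpy as np


def normal_pdf(x, s):
    """Density of N(0, s^2) at x; with s = h this is K_h for a Gaussian kernel K."""
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def u_stat_2(matrix):
    """U_n^{(2)}: 2 / (n (n-1)) times the sum of a symmetric kernel matrix over i < j."""
    n = matrix.shape[0]
    i, j = np.triu_indices(n, k=1)
    return 2.0 * matrix[i, j].sum() / (n * (n - 1))


def hoeffding_sides(x, h):
    """Both sides of (2) for R(x, y) = K_h(x - y), Gaussian K, and N(0, 1) data."""
    R = normal_pdf(x[:, None] - x[None, :], h)       # R(X_i, X_j)
    g = normal_pdf(x, np.sqrt(1.0 + h**2))           # E R(X_i, X_1'): N(0, 1 + h^2) density
    theta = normal_pdf(0.0, np.sqrt(2.0 + h**2))     # E R(X_1, X_2): N(0, 2 + h^2) density at 0
    pi1 = g - theta                                  # (pi_1 R)(X_i)
    pi2 = R - g[:, None] - g[None, :] + theta        # (pi_2 R)(X_i, X_j)
    lhs = u_stat_2(R) - theta
    rhs = 2.0 * pi1.mean() + u_stat_2(pi2)
    return lhs, rhs


rng = np.random.default_rng(1)
sample = rng.standard_normal(300)
print(hoeffding_sides(sample, h=0.3))
```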



3. Estimation of $\int_{\mathbb{R}} f_0^2(x)\,dx$

In this section, we prove that the simple estimator $T_n(h_n)$ defined in (1) is optimal: rate-optimal if $\alpha \le 1/4$ and asymptotically efficient if $\alpha > 1/4$.

Here and elsewhere in this article, we take the kernel $K$ in (1) to be a symmetric and bounded function such that $\int K(u)\,du = 1$, as well as $\int |K(u)||u|\,du < \infty$, and $0 < h_n \to 0$. For ease of notation, we will often write $K_{h_n}(x) := h_n^{-1}K(x/h_n)$. Also, we define the Sobolev ball $H_\alpha(R) = \{\phi : \|\phi\|_{2,\alpha} \le R\}$ and $B(L) = \{\phi : \|\phi\|_\infty \le L\}$.



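Theorem 1. Let $f_0 \in H_2^\alpha \cap L^\infty$ for some $0 < \alpha < 1/2$.

I. We have

$$\sup_{f_0 \in H_\alpha(R)}\left|ET_n(h_n) - \int f_0^2(x)\,dx\right| \le B(h_n) := c_1(R)\,h_n^{2\alpha} \qquad (3)$$

and

$$\sup_{f_0 \in H_\alpha(R)\cap B(L)} E\left|T_n(h_n) - ET_n(h_n) - \frac{1}{n}\sum_{i=1}^n Y_i\right| \le c_2(R)\,\tilde\sigma(h_n,n) := c_2(R)\sqrt{\frac{L h_n^{2\alpha}}{n} + \frac{L}{n^2 h_n}}, \qquad (4)$$

where $Y_i = 2\left(f_0(X_i) - \int f_0^2\right)$ and where $c_1(R)$ and $c_2(R)$ are numerical constants depending only on $R$ and the function $K$.

II. As a consequence, taking $h_n$ so that $h_n \simeq n^{-2/(4\alpha+1)}$, we have the following:

(a) if $0 < \alpha \le 1/4$, then

$$T_n(h_n) - \int f_0(x)^2\,dx = O_P\left(n^{-4\alpha/(4\alpha+1)}\right);$$

(b) if $\alpha > 1/4$, and if $\tau^2 = \int_{\mathbb{R}} f_0^3 - \left(\int_{\mathbb{R}} f_0^2\right)^2$, then

$$\sqrt{n}\left(T_n(h_n) - \int f_0(x)^2\,dx\right) \to_d Z \sim N(0, 4\tau^2).$$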
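Part II ties the bandwidth to $\alpha$ through $h_n \simeq n^{-2/(4\alpha+1)}$, which balances the bias $h_n^{2\alpha}$ from (3) against the degenerate part $1/(n\sqrt{h_n})$ of (4); the linear term $n^{-1}\sum_i Y_i$ is always of order $n^{-1/2}$ and dominates exactly when $\alpha > 1/4$. A small sketch of this bookkeeping follows. All constants are set to one, which is an assumption, since the theorem only determines orders of magnitude.

```python
import math


def oracle_bandwidth(n, alpha):
    """h_n = n^{-2/(4 alpha + 1)}, the choice in Theorem 1, Part II (constant set to 1)."""
    return n ** (-2.0 / (4.0 * alpha + 1.0))


def error_terms(n, alpha, h):
    """Orders of the three error sources in Theorem 1 (all constants set to 1)."""
    bias = h ** (2.0 * alpha)                 # bound (3)
    degenerate = 1.0 / (n * math.sqrt(h))     # the S_2 part of bound (4)
    linear = 1.0 / math.sqrt(n)               # the average of the Y_i
    return bias, degenerate, linear


def rate(n, alpha):
    """Overall rate: n^{-4 alpha / (4 alpha + 1)} if alpha <= 1/4, otherwise n^{-1/2}."""
    return n ** (-4.0 * alpha / (4.0 * alpha + 1.0)) if alpha <= 0.25 else n ** -0.5


if __name__ == "__main__":
    n = 10**6
    for alpha in (0.1, 0.25, 0.4):
        h = oracle_bandwidth(n, alpha)
        print(alpha, h, error_terms(n, alpha, h), rate(n, alpha))
```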



Proof. We first treat the bias term, where we adapt an observation due to Giné and Nickl (2007), Section 4.1.1, to the present situation. The bias equals

$$\begin{aligned}
ET_n(h_n) - \int f_0^2 &= \int\int K_{h_n}(x-y)f_0(y)\,dy\,f_0(x)\,dx - \int f_0(x)f_0(x)\,dx\\
&= \int\int K_{h_n}(x-y)\,[f_0(y) - f_0(x)]\,f_0(x)\,dy\,dx\\
&= \int\int K(u)\,[f_0(x-uh_n) - f_0(x)]\,f_0(x)\,du\,dx\\
&= \int K(u)\left[\int \tilde f_0(uh_n - x)f_0(x)\,dx - \int \tilde f_0(0-x)f_0(x)\,dx\right]du\\
&= \int K(u)\,\big[(\tilde f_0 * f_0)(uh_n) - (\tilde f_0 * f_0)(0)\big]\,du, \qquad (5)
\end{aligned}$$

where $\tilde f_0(x) = f_0(-x)$ and $*$ denotes convolution. The essential observation now is that the smoothness of $\tilde f_0 * f_0$ will be of order $2\alpha$ instead of just $\alpha$, due to the smoothing properties of convolutions. The following elementary Fourier analytic lemma shows how this applies in our setup.

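Lemma 1. Suppose that $f, g \in H_2^\alpha$ with $0 < \alpha < 1/2$. Then, for any $x \in \mathbb{R}$ and $t \ne 0$,

$$\frac{|(f*g)(x+t) - (f*g)(x)|}{|t|^{2\alpha}} \le C\,\|f\|_{2,\alpha}\|g\|_{2,\alpha},$$

where $0 < C < \infty$ is a fixed constant that does not depend on $f$, $g$, $x$ or $t$.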

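Proof. As we will only use this lemma for $f, g \in L^1$, and in order to avoid some technicalities, we will prove it only in this case. Hence, $f * g$ is in $L^1$ and is continuous and, since $f, g$ are also in $L^2$, we also have $F(f*g) = Ff \cdot Fg \in L^1$. Consequently, we can apply the Fourier inversion theorem to obtain

$$\begin{aligned}
\frac{|(f*g)(x+t) - (f*g)(x)|}{|t|^{2\alpha}} &\le |t|^{-2\alpha}\left|F^{-1}\big[F\big((f*g)(\cdot+t) - (f*g)(\cdot)\big)\big](x)\right|\\
&\le (2\pi)^{-1}|t|^{-2\alpha}\left\|F\big[(f*g)(\cdot+t) - (f*g)(\cdot)\big]\right\|_1\\
&= (2\pi)^{-1}|t|^{-2\alpha}\int \left|F(f*g)(u)\,[e^{-iut} - 1]\right|du\\
&= (2\pi)^{-1}\int |Ff(u)|\,|Fg(u)|\,|u|^{2\alpha}\,\frac{|e^{-iut} - e^{-i0}|}{|ut|^{2\alpha}}\,du\\
&\le C\,\|f\|_{2,\alpha}\|g\|_{2,\alpha},
\end{aligned}$$

since $e^{-i(\cdot)}$ is bounded and Lipschitz, so that $\sup_{s\ne 0}|e^{-is} - 1|/|s|^{2\alpha} < \infty$ for $2\alpha < 1$, and since $|u|^{2\alpha} \le (1+u^2)^{\alpha}$, so that the Cauchy-Schwarz inequality applies. □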
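A standard example (chosen here only for illustration) shows the doubling of smoothness at work. The uniform density $f = 1_{[0,1]}$ lies in $H_2^\alpha$ for every $\alpha < 1/2$, since $|Ff(u)|^2 = 4\sin^2(u/2)/u^2$, yet it jumps, so $\sup_x|f(x+t) - f(x)| = 1$ for every $t \ne 0$. Its autocorrelation $(\tilde f * f)(t) = (1-|t|)_+$ is nevertheless Lipschitz, in line with the order $|t|^{2\alpha}$ of Lemma 1. The sketch checks both facts numerically with a Riemann sum.

```python
import numpy as np


def indicator(x):
    """f = 1_[0,1]: a density in H_2^alpha for every alpha < 1/2, yet discontinuous."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x <= 1.0)).astype(float)


def autocorr(t, dx=1e-4):
    """(f~ * f)(t) = int f(x - t) f(x) dx, approximated by a Riemann sum on [-2, 3]."""
    x = np.arange(-2.0, 3.0, dx)
    return float(np.sum(indicator(x - t) * indicator(x)) * dx)


def jump(t):
    """|f(x0 + t) - f(x0)| at x0 = 1 - t/2, a point straddling the discontinuity at 1."""
    x0 = 1.0 - t / 2.0
    return float(abs(indicator(x0 + t) - indicator(x0)))


for t in (0.1, 0.01, 0.001):
    # f itself does not become continuous as t -> 0, but f~ * f changes by about |t|.
    print(t, jump(t), autocorr(0.0) - autocorr(t))
```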
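This lemma (applied with $f = \tilde f_0$ and $g = f_0$, noting that $\|\tilde f_0\|_{2,\alpha} = \|f_0\|_{2,\alpha}$) and identity (5) now give, by the conditions on the kernel, that

$$\left|ET_n(h_n) - \int f_0^2\right| \le c\,h_n^{2\alpha},$$

where $c = C\|f_0\|_{2,\alpha}^2\int |K(u)||u|^{2\alpha}\,du \le CR^2\int |K(u)||u|^{2\alpha}\,du$, that is, (3).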
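Identity (5) can be checked numerically in a case where everything is explicit. Assuming, for illustration only, that $f_0$ is the $N(0,1)$ density and $K$ is Gaussian, $\tilde f_0 * f_0$ is the $N(0,2)$ density and $ET_n(h) = \int (K_h * f_0) f_0$ is the $N(0, 2+h^2)$ density at 0. The sketch compares the bias obtained from the right-hand side of (5), by quadrature in $u$, with this closed form. For such a smooth $f_0$ the bias is of order $h^2$, which is consistent with (3) for every $\alpha < 1/2$.

```python
import numpy as np


def normal_pdf(x, s):
    """Density of N(0, s^2) at x."""
    return np.exp(-0.5 * (np.asarray(x, dtype=float) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def conv_ftilde_f(t):
    """(f~ * f)(t) for f the N(0, 1) density: this is the N(0, 2) density."""
    return normal_pdf(t, np.sqrt(2.0))


def bias_from_identity_5(h, half_width=12.0, num=24001):
    """Right-hand side of (5): int K(u) [(f~*f)(u h) - (f~*f)(0)] du, by a Riemann sum."""
    u = np.linspace(-half_width, half_width, num)
    du = u[1] - u[0]
    integrand = normal_pdf(u, 1.0) * (conv_ftilde_f(u * h) - conv_ftilde_f(0.0))
    return float(integrand.sum() * du)


def bias_closed_form(h):
    """E T_n(h) - int f^2: the N(0, 2 + h^2) density at 0 minus the N(0, 2) density at 0."""
    return float(normal_pdf(0.0, np.sqrt(2.0 + h**2)) - normal_pdf(0.0, np.sqrt(2.0)))


for h in (0.05, 0.1, 0.2):
    print(h, bias_from_identity_5(h), bias_closed_form(h))
```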
Next, we show (4). Setting

$$R(u,v) := K_{h_n}(u-v),$$

we can write, in U-statistic notation, $T_n(h_n) = U_n^{(2)}(R)$ or, if $\bar R(u,v) = R(u,v) - ER(X_1,X_2)$,

$$T_n(h_n) - ET_n(h_n) = U_n^{(2)}(\bar R).$$

So, by Hoeffding's decomposition (2), it remains to estimate the following statistics (note that $\pi_i\bar R = \pi_i R$, $i = 1,2$):

$$U_n^{(2)}(\bar R) - \frac{1}{n}\sum_{i=1}^n Y_i = \left(2U_n^{(1)}(\pi_1 R) - \frac{1}{n}\sum_{i=1}^n Y_i\right) + U_n^{(2)}(\pi_2 R) =: S_1 + S_2.$$

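First, we have, by Plancherel,

$$\begin{aligned}
nES_1^2 &\le E\left(2\int K_{h_n}(X_1-y)f_0(y)\,dy - 2f_0(X_1)\right)^2\\
&\le 4\|f_0\|_\infty\|K_{h_n}*f_0 - f_0\|_2^2\\
&= \frac{4}{2\pi}\|f_0\|_\infty\int |FK(h_n u) - FK(0)|^2\,|Ff_0(u)|^2\,du\\
&\le \frac{4}{2\pi}\|f_0\|_\infty\,h_n^{2\alpha}\sup_{u\ne 0}\left(\frac{|FK(h_n u) - FK(0)|}{|h_n u|^{\alpha}}\right)^2\|f_0\|_{2,\alpha}^2\\
&\le \frac{8}{\pi}\|f_0\|_\infty\left(\int |K(x)||x|^{\alpha}\,dx\right)^2\|f_0\|_{2,\alpha}^2\,h_n^{2\alpha}, \qquad (6)
\end{aligned}$$

where in the last step we used $|e^{-is} - 1| \le 2|s|^{\alpha}$.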



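Next, since $\pi_2$ is an orthogonal projection in $L^2(f_0(x)f_0(y)\,dx\,dy)$, it follows from Young's inequalities that

$$ES_2^2 \le \frac{2}{n(n-1)}ER^2 = \frac{2}{n(n-1)}E\big[K_{h_n}^2(X_1 - X_2)\big] = \frac{2}{n(n-1)}\int (K_{h_n}^2 * f_0)(y)f_0(y)\,dy \le \frac{2\|f_0\|_\infty\|K\|_2^2}{n(n-1)h_n}. \qquad (7)$$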



Now, (6) and (7) complete the proof of (4). The remaining claims in Part II follow by the choice of the bandwidth and, in case (a) (and hence $\alpha \le 1/4$), noting that we have $n^{-1}\sum_{i=1}^n Y_i = O_P(n^{-1/2}) = O_P(n^{-4\alpha/(4\alpha+1)})$ and, in case (b), from the central limit theorem for the random variables $Y_i$. □
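The two variance bounds (6) and (7) can be examined in a Gaussian example, which is an assumption made only for illustration: $f_0$ standard normal, $K$ Gaussian, $\alpha = 0.4$ and $h = 0.1$. The sketch computes $nES_1^2 = 4\,\mathrm{Var}\big((K_h*f_0 - f_0)(X_1)\big)$ and $EK_h^2(X_1 - X_2)$ by quadrature and compares them with the right-hand sides of (6) and of the Young step in (7). For this choice, $\int|K(x)||x|^\alpha dx = E|Z|^\alpha = 2^{\alpha/2}\Gamma((\alpha+1)/2)/\sqrt{\pi}$ and $\|f_0\|_{2,\alpha}^2 = \int e^{-u^2}(1+u^2)^\alpha du$.

```python
import math
import numpy as np


def normal_pdf(x, s=1.0):
    """Density of N(0, s^2) at x."""
    return np.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


alpha, h = 0.4, 0.1                          # illustrative values
x = np.linspace(-15.0, 15.0, 30001)
dx = x[1] - x[0]
f = normal_pdf(x)                            # f_0: standard normal density (assumption)

# Bound (6): n E S_1^2 = 4 Var(psi(X_1)), psi = K_h * f_0 - f_0 = N(0, 1 + h^2) pdf - f_0.
psi = normal_pdf(x, math.sqrt(1.0 + h**2)) - f
mean_psi = np.sum(psi * f) * dx
lhs6 = 4.0 * (np.sum(psi**2 * f) * dx - mean_psi**2)
sup_f = 1.0 / math.sqrt(2.0 * math.pi)                                       # ||f_0||_inf
abs_moment = 2.0 ** (alpha / 2) * math.gamma((alpha + 1) / 2) / math.sqrt(math.pi)  # E|Z|^alpha
sobolev_sq = np.sum(np.exp(-x**2) * (1.0 + x**2) ** alpha) * dx              # ||f_0||_{2,alpha}^2
rhs6 = 8.0 / math.pi * sup_f * abs_moment**2 * sobolev_sq * h ** (2 * alpha)

# Bound (7): E K_h^2(X_1 - X_2), with X_1 - X_2 ~ N(0, 2), against ||f_0||_inf ||K||_2^2 / h.
ER2 = np.sum(normal_pdf(x, h) ** 2 * normal_pdf(x, math.sqrt(2.0))) * dx
rhs7 = sup_f * (1.0 / (2.0 * math.sqrt(math.pi))) / h                        # ||K||_2^2 = 1/(2 sqrt(pi))

print(lhs6, rhs6, ER2, rhs7)
```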



Without loss of generality, we restricted ourselves to $0 < \alpha < 1/2$ in Theorem 1. It is obvious that Part II holds for all $\alpha > 0$ and it can be seen that Part I does too, although this is not of interest here.
Remark 1. A second plug-in estimator of $\int f_0^2$ is

$$\tilde T_n(h_n) = \frac{2}{n(n-1)}\sum_{1\le i<j\le n}\int K_{h_n}(x - X_i)K_{h_n}(x - X_j)\,dx,$$

obtained by integrating the square of the usual kernel density estimator and deleting the diagonal terms. Although Theorem 1 could also be proved for this estimator, we choose to work with $T_n(h_n)$ because it is simpler to compute. The results of Theorem 1 for $\tilde T_n$ can be derived by similar computations. Here, we briefly consider the bias, which is really the main part, by relating it to the bias of $T_n$ (as in Bickel and Ritov (1988)): using (3) and (6), we have

$$\begin{aligned}
\left|E\tilde T_n(h_n) - \int f_0^2\right| &= \left|E\tilde T_n(h_n) - 2ET_n(h_n) + \int f_0^2 + 2\left(ET_n(h_n) - \int f_0^2\right)\right|\\
&\le \left|E\tilde T_n(h_n) - 2ET_n(h_n) + \int f_0^2\right| + 2c_1h_n^{2\alpha}\\
&= \int \big[(K_{h_n}*f_0)(x) - f_0(x)\big]^2\,dx + 2c_1h_n^{2\alpha}\\
&\le \frac{2}{\pi}\left(\int |K(u)||u|^{\alpha}\,du\right)^2\|f_0\|_{2,\alpha}^2\,h_n^{2\alpha} + 2c_1h_n^{2\alpha}.
\end{aligned}$$

Butucea (2007) also obtains such a bound, but only for the special kernel $K(x) = \sin(x)/(\pi x)$, and it is the use of Lemma 1 that allows us to consider the case of general kernels.
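For a Gaussian kernel (an assumption; the remark concerns general $K$), $\int K_h(x-a)K_h(x-b)\,dx$ is the $N(0, 2h^2)$ density at $a-b$, so $\tilde T_n(h)$ coincides with $T_n(h\sqrt2)$. The sketch below computes $\tilde T_n$ by brute-force quadrature, exactly as defined in the remark, and checks this identity. It also illustrates in what sense $T_n$ is simpler to compute: it needs no integral at all.

```python
import numpy as np


def normal_pdf(x, s):
    """Density of N(0, s^2) at x; with s = h this is K_h for a Gaussian K."""
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def t_n(x, h):
    """Estimator (1) with Gaussian K."""
    n = x.size
    i, j = np.triu_indices(n, k=1)
    return 2.0 * normal_pdf(x[i] - x[j], h).sum() / (n * (n - 1))


def t_tilde_quadrature(x, h, step=0.02, pad=12.0):
    """Remark 1's estimator: 2/(n(n-1)) sum_{i<j} int K_h(t - X_i) K_h(t - X_j) dt, with the
    integral replaced by a Riemann sum on a uniform grid (step and pad in units of h)."""
    n = x.size
    dt = step * h
    t = np.arange(x.min() - pad * h, x.max() + pad * h, dt)
    kmat = normal_pdf(t[None, :] - x[:, None], h)    # row i holds K_h(t - X_i) on the grid
    gram = (kmat @ kmat.T) * dt                      # gram[i, j] ~ int K_h(t - X_i) K_h(t - X_j) dt
    i, j = np.triu_indices(n, k=1)
    return 2.0 * gram[i, j].sum() / (n * (n - 1))


rng = np.random.default_rng(2)
sample = rng.standard_normal(40)
print(t_tilde_quadrature(sample, 0.3), t_n(sample, 0.3 * np.sqrt(2.0)))
```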



Remark 2. Bickel and Ritov (1988) show that if $|f_0(x+h) - f_0(x)| \le g(x)|h|^{\alpha}$ for some $g \in L^2 \cap L^\infty$, all $x \in \mathbb{R}$, $|h| \le 1$ and $\alpha > 1/4$, then

$$\sqrt{n}\left(2T_n(h_n) - \tilde T_n(h_n) - \int f_0^2\right) \to_d Z \sim N(0, 4\tau^2)$$

with $h_n = n^{-2/(4\alpha+1)}$ (actually they consider a 'decoupled' version). Clearly, any such $f_0$ is contained in $H_2^\beta$ for all $\beta < \alpha$ and this implies, by Theorem 1, that the simpler estimator $T_n(h_n)$ satisfies the same central limit theorem. (As a matter of fact, Lemma 1 and hence Theorem 1 also hold for such $f_0$, even without requiring $g \in L^\infty$; see Lemma 12 (and the discussion following it) in Giné and Nickl (2007). However, the proofs there are much more technical, which is why we prefer to work with Sobolev spaces here.)
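Why the combination $2T_n - \tilde T_n$ is natural can be read off from the bias computation in Remark 1: its bias is $-\int[(K_h*f_0) - f_0]^2$, a squared and hence higher-order quantity. The sketch evaluates the three biases in closed form for an illustrative Gaussian case ($f_0$ the $N(0,1)$ density and $K$ Gaussian, so that $ET_n(h)$ and $E\tilde T_n(h)$ are the $N(0,2+h^2)$ and $N(0,2+2h^2)$ densities at 0), and checks the bias of $2T_n - \tilde T_n$ against a direct quadrature of $\int[(K_h*f_0) - f_0]^2$. These distributional choices are assumptions, not part of the paper.

```python
import math
import numpy as np


def npdf0(var):
    """Density of N(0, var) at 0."""
    return 1.0 / math.sqrt(2.0 * math.pi * var)


def biases(h):
    """Biases of T_n(h), T~_n(h) and 2 T_n(h) - T~_n(h) for f_0 = N(0,1) and Gaussian K."""
    target = npdf0(2.0)                    # int f_0^2
    mean_t = npdf0(2.0 + h**2)             # E T_n(h)  = int (K_h * f_0) f_0
    mean_tt = npdf0(2.0 + 2.0 * h**2)      # E T~_n(h) = int (K_h * f_0)^2
    return mean_t - target, mean_tt - target, 2.0 * mean_t - mean_tt - target


def l2_dist_sq(h):
    """int [(K_h * f_0) - f_0]^2 by quadrature; K_h * f_0 is the N(0, 1 + h^2) density."""
    x = np.linspace(-15.0, 15.0, 30001)
    dx = x[1] - x[0]
    s = math.sqrt(1.0 + h**2)
    diff = np.exp(-0.5 * (x / s) ** 2) / s - np.exp(-0.5 * x**2)   # sqrt(2 pi) times the difference
    return float(np.sum(diff**2) * dx / (2.0 * math.pi))


for h in (0.1, 0.2):
    print(h, biases(h), -l2_dist_sq(h))
```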



4. Adaptive estimation of $\int_{\mathbb{R}} f_0^2(x)\,dx$

In Theorem 1, one must know $\alpha$ in order to choose $h_n$ in an optimal way, $h_n$ ranging between $n^{-2}$ and 1. We will now use $T_n(h_n)$ to construct a kernel-based rate-adaptive estimator of $\int f_0^2$ that requires only that $\int f_0^2$ is bounded by a known constant $L$ and that $f_0$ is a bounded function contained in $H_2^\alpha$ for some (unknown) $\alpha > 0$. In practice, one can restrict oneself to $0 < \alpha \le 1/4 + \varepsilon$ with $\varepsilon$ positive and arbitrary, since the rate of convergence $n^{-1/2}$ in part (b) of Theorem 1 could not be improved if one knew that $\alpha$ were larger. In particular, it suffices to consider bandwidths that are no larger than $n^{-1+\delta}$ for some arbitrarily small $\delta > 0$.

In what follows, we borrow in part from methods developed by Lepski and Spokoiny (1997) for kernel-based pointwise adaptive estimation in the Gaussian white noise model. Our situation, however, is substantially different in several respects. For instance, there is a critical breakpoint in convergence rates at $\alpha = 1/4$ and we do not have the convenience of immediate Gaussian tail inequalities.

For any given $n \in \mathbb{N}$, $n > 1$, we define a grid of bandwidths

$$\mathcal{H} := \left\{h_k \ge \frac{(\log n)^4}{n^2} : h_0 = \frac{1}{n^{1-\delta}},\ h_1 = \frac{\log n}{n},\ h_2 = \frac{\ell(n)}{n},\ h_{k+1} = \frac{h_k}{\rho},\ k = 2, 3, \dots\right\},$$

where $\rho > 1$ and $\ell(n)$ is any function such that $\ell(n) \to 0$ and $\ell(n)\log n \to \infty$ as $n \to \infty$, and $\ell(n) \le \log n$ for all $n$. (In particular, $\ell(n)$ can be chosen to tend to zero as slowly as desired.) It is easy to check that the number of elements in this grid is smaller than $3 + (\log n)/(\log \rho) = O(\log n)$ and we shall use this estimate below. Next, we define the function $d(h)$ for all $h \in [n^{-2}(\log n)^4, n^{-1+\delta}]$ as



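$$d(h) = \sqrt{2M\log(1/h)}\ \text{ for } h < h_2 \qquad\text{and}\qquad d(h) = \ell(n)^{-1/2}\ \text{ for } h_0 \ge h \ge h_2,$$

where $M := 12^2\|K\|_2^2L$ and where we recall that $L$ is a bound on $\int f_0^2$. We also set $\sigma(h,n) = n^{-1}h^{-1/2}$. The bandwidth estimator is defined as

$$\hat h_n = \max\left\{h \in \mathcal{H} : |T_n(h) - T_n(g)| \le \sigma(g,n)d(g)\ \ \forall g < h,\ g \in \mathcal{H}\right\}.$$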



Remark 3. If $h$ equals the next-to-last element in the grid $\mathcal{H}$ and $g$ is the last, then $\sigma(g,n)d(g)$ is of the order $(\log n)^{-3/2}$, whereas $|T_n(h) - T_n(g)| = O_P((\log n)^{-2})$, by Theorem 1. Hence, $\hat h_n$ exists with probability tending to 1 as $n \to \infty$. In the next theorem, expectations that involve events based on $\hat h_n$ should be understood as taken over the event $\{\hat h_n \text{ exists}\}$.

Remark 4. In cases (a) and (b) of Theorem 2 below, the rates of convergence obtained are, in fact, slightly slower than those in Theorem 1. This is not surprising, as Efromovich and Low (1996) showed that one must pay exactly these penalties if one wants to estimate $\int f_0^2$ adaptively.

Remark 5. Laurent (2005) considered adaptive estimation of $\int_{\mathbb{R}} f_0^2$ by model selection. Her results are comparable to our Theorem 2 below. (She considers $f_0$ contained in the Besov space $B^{\alpha}_{2,\infty}$, which is slightly more general in view of the imbeddings $B^{\alpha+\varepsilon}_{2,\infty} \subset H_2^\alpha \subset B^{\alpha}_{2,\infty}$ for every $\varepsilon > 0$.) In her Theorem 1, she assumes that an a priori bound for $\|f_0\|_\infty$ is known. Similarly, we have the assumption of a known upper bound $L$ for $\int f_0^2$. In her Theorem 2, Laurent (2005) proposes a remedy for this problem by estimating this upper bound. Similarly, we could estimate the upper bound $L$ by $T_n(h_{\min})$ to achieve the same goal.



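Theorem 2. Let $f_0 \in H_2^\alpha \cap L^\infty$ for some $\alpha > 0$.

(a) If $0 < \alpha < 1/4$, then

$$T_n(\hat h_n) - \int f_0^2(x)\,dx = O_P\left(\left(\frac{\sqrt{\log n}}{n}\right)^{4\alpha/(4\alpha+1)}\right).$$

(b) If $\alpha = 1/4$, then

$$T_n(\hat h_n) - \int f_0^2(x)\,dx = O_P\left(n^{-1/2}\ell(n)^{-1}\right).$$

(c) If $\alpha > 1/4$ and $\tau^2 = \int_{\mathbb{R}} f_0^3 - \left(\int_{\mathbb{R}} f_0^2\right)^2$, then

$$\sqrt{n}\left(T_n(\hat h_n) - \int f_0^2(x)\,dx\right) \to_d Z \sim N(0, 4\tau^2).$$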
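The price of adaptation in Theorem 2 is easy to quantify: for $\alpha < 1/4$ the rate $(\sqrt{\log n}/n)^{4\alpha/(4\alpha+1)}$ exceeds the rate $n^{-4\alpha/(4\alpha+1)}$ of Theorem 1 by the factor $(\log n)^{2\alpha/(4\alpha+1)}$; at $\alpha = 1/4$ the loss is the factor $\ell(n)^{-1}$; and for $\alpha > 1/4$ nothing is lost. A small sketch, with $\ell(n) = (\log n)^{-1/2}$ chosen purely for illustration:

```python
import math


def oracle_rate(n, alpha):
    """Theorem 1: n^{-4a/(4a+1)} for a <= 1/4, n^{-1/2} otherwise."""
    return n ** (-4 * alpha / (4 * alpha + 1)) if alpha <= 0.25 else n ** -0.5


def adaptive_rate(n, alpha, ell=lambda m: 1.0 / math.sqrt(math.log(m))):
    """Theorem 2 with a hypothetical ell(n); for alpha > 1/4 the n^{-1/2} rate is kept."""
    if alpha < 0.25:
        return (math.sqrt(math.log(n)) / n) ** (4 * alpha / (4 * alpha + 1))
    if alpha == 0.25:
        return n ** -0.5 / ell(n)
    return n ** -0.5


for a in (0.1, 0.25, 0.5):
    n = 10**8
    print(a, adaptive_rate(n, a) / oracle_rate(n, a))   # the multiplicative price of adaptation
```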

Proof. We first observe that $\tilde\sigma(h,n)$ and $\sigma(h,n)$ are of the same order whenever $h \le h_1$ and $n$ is large enough, which will always be the case in this proof. Define $h_f$ ($= h_f(\alpha)$) as $h_1$ if $\alpha > 1/4$, as $h_2$ if $\alpha = 1/4$ and, otherwise,

$$h_f = \max\left\{h \in \mathcal{H} : c_1h^{2\alpha} \le \tfrac14\sigma(h,n)d(h),\ h < h_2\right\}.$$

It is easily checked that $h_f$ exists and is of the order of $(n/\sqrt{\log n})^{-2/(4\alpha+1)}$ if $\alpha < 1/4$. By construction in case $\alpha < 1/4$ and by straightforward computations in the other two cases, we have, for $n$ large enough,

$$B(h_f) \le \tfrac14\sigma(h_f,n)d(h_f). \qquad (8)$$

We estimate the expectation of

$$\left|T_n(\hat h_n) - \int f_0^2(x)\,dx - \frac1n\sum_{i=1}^n Y_i\right|$$

over each of the two events $\{\hat h_n \ge h_f\}$ and $\{\hat h_n < h_f\}$. In the first case, we have

$$\begin{aligned}
&E\left|T_n(\hat h_n) - \int f_0^2(x)\,dx - \frac1n\sum_{i=1}^n Y_i\right| I[\hat h_n \ge h_f]\\
&\quad\le E\left[\left(|T_n(\hat h_n) - T_n(h_f)| + \left|T_n(h_f) - ET_n(h_f) - \frac1n\sum_{i=1}^n Y_i\right| + \left|ET_n(h_f) - \int f_0^2\right|\right) I[\hat h_n \ge h_f]\right]\\
&\quad\le \sigma(h_f,n)d(h_f) + c_2\tilde\sigma(h_f,n) + B(h_f)\\
&\quad= O\big(\sigma(h_f,n)d(h_f)\big),
\end{aligned}$$

where we use the definition of $\hat h_n$, Theorem 1 and (8). In the other case, where $\{\hat h_n < h_f\}$, we will rely on the following lemma, which will be proved below.

Lemma 2. Let $h \in \mathcal{H}$ and $h < h_f$. There exists a constant $D < \infty$ so that, for all $n$ large enough, if $h < h_2$, then

$$\Pr(\hat h_n = h) \le D(\log n)\exp\left(-d^2(h)/M\right)$$

and if $h = h_2$, then

$$\Pr(\hat h_n = h_2) \le D\left[\exp\left(-d^2(h_2)/M\right) + (\log n)\exp\left(-d^2(h_3)/M\right)\right].$$

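This lemma, Theorem 1, (8), the size of the grid and the definition of $d(h)$ now give, for $\alpha \le 1/4$ and hence for $h_f \le h_2$,

$$\begin{aligned}
&E\left|T_n(\hat h_n) - \int f_0^2(x)\,dx - \frac1n\sum_{i=1}^n Y_i\right| I[\hat h_n < h_f]\\
&\quad= \sum_{h\in\mathcal{H}:\,h<h_f} E\left|T_n(h) - \int f_0^2(x)\,dx - \frac1n\sum_{i=1}^n Y_i\right| I[\hat h_n = h]\\
&\quad\le \sum_{h\in\mathcal{H}:\,h<h_f} E\left[\left(\left|T_n(h) - ET_n(h) - \frac1n\sum_{i=1}^n Y_i\right| + \left|ET_n(h) - \int f_0^2\right|\right) I[\hat h_n = h]\right] \qquad (9)\\
&\quad\le \sum_{h\in\mathcal{H}:\,h<h_f}\left(E\left|T_n(h) - ET_n(h) - \frac1n\sum_{i=1}^n Y_i\right|^2\right)^{1/2}\big(\Pr(\hat h_n = h)\big)^{1/2} + B(h_f)\\
&\quad\le \sum_{h\in\mathcal{H}:\,h<h_f} c_3D^{1/2}(\log n)^{1/2}n^{-1}h^{-1/2}\,h + \tfrac14\sigma(h_f,n)d(h_f)\\
&\quad\le D'n^{-1}(\log n)^{3/2}h_f^{1/2} + \tfrac14\sigma(h_f,n)d(h_f)\\
&\quad= Z_n(\alpha) + O\big(\sigma(h_f,n)d(h_f)\big),
\end{aligned}$$

where $c_3$ is a constant, $D'$ is an absolute constant and $Z_n(\alpha) = o(n^{-1/2})$ if $\alpha \ge 1/4$ and

$$Z_n(\alpha) = o\left(\left(\frac{\sqrt{\log n}}{n}\right)^{4\alpha/(4\alpha+1)}\right)$$

otherwise (as can easily be seen from the definition of $h_f$). If $\alpha > 1/4$ (and hence $h_f = h_1$), then one must add the term

$$\left(E\left|T_n(h_2) - ET_n(h_2) - \frac1n\sum_{i=1}^n Y_i\right|^2\right)^{1/2}\big(\Pr(\hat h_n = h_2)\big)^{1/2} \le c_3D^{1/2}(n\ell(n))^{-1/2}\left[\exp\left(-d^2(h_2)/M\right) + (\log n)\exp\left(-d^2(h_3)/M\right)\right]^{1/2} = o(n^{-1/2})$$

in the sum over $h < h_f$ in the line before (9), hence yielding the same result.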
Summarizing these findings, we conclude that

$$E\left|T_n(\hat h_n) - \int f_0^2(x)\,dx - \frac1n\sum_{i=1}^n Y_i\right| = O\big(\sigma(h_f,n)d(h_f)\big) + Z_n(\alpha) + o(n^{-1/2}). \qquad (10)$$

By definition of $h_f$, it follows that, if $\alpha > 1/4$,

$$\sigma(h_f,n)d(h_f) \simeq n^{-1+(1/2)}(\log n)^{-1/2}\ell(n)^{-1/2} = o(n^{-1/2}),$$

hence giving the central limit theorem in part (c) of the theorem by (10). Similarly, for part (b), if $\alpha = 1/4$ and hence $h_f = h_2$, we obtain

$$\sigma(h_f,n)d(h_f) \simeq n^{-1+(1/2)}\ell(n)^{-1/2}\ell(n)^{-1/2} = O\left(n^{-1/2}\ell(n)^{-1}\right)$$

and if $\alpha < 1/4$, we have

$$\sigma(h_f,n)d(h_f) \simeq \frac{\sqrt{\log n}}{n}\left(\frac{n}{\sqrt{\log n}}\right)^{1/(4\alpha+1)} = O\left(\left(\frac{\sqrt{\log n}}{n}\right)^{4\alpha/(4\alpha+1)}\right),$$

giving part (a). □



It hence remains to prove Lemma 2, where we will use Bernstein's inequality and an 
exponential inequality for canonical U-statistics of order 2. 

Proof of Lemma 2. Choose some $h < h_f$, $h \in \mathcal{H}$, and let $h_+$ be the previous (that is, the next larger) element in the grid, so that $h_+ = \rho h$ if $h < h_2$. By definition of $\hat h_n$, we have

$$\Pr(\hat h_n = h) \le \sum_{g\in\mathcal{H}:\,g\le h}\Pr\big(|T_n(g) - T_n(h_+)| > \sigma(g,n)d(g)\big).$$

However,

$$|T_n(g) - T_n(h_+)| \le \big|T_n(g) - ET_n(g) - (T_n(h_+) - ET_n(h_+))\big| + B(g) + B(h_+),$$

where $g \le h < h_f$ and also $h_+ \le h_f$ since $h_f \in \mathcal{H}$. Consequently, by (8),

$$B(g) + B(h_+) \le 2B(h_f) \le \tfrac12\sigma(h_f,n)d(h_f) \le \tfrac12\sigma(g,n)d(g).$$

Hence,

$$\Pr(\hat h_n = h) \le \sum_{g\in\mathcal{H}:\,g\le h}\Pr\Big(\big|T_n(g) - ET_n(g) - (T_n(h_+) - ET_n(h_+))\big| > \tfrac12\sigma(g,n)d(g)\Big). \qquad (11)$$

For ease of notation, we set

$$L(x,y) := L_g(x,y) = K_g(x-y) - K_{h_+}(x-y)$$

and

$$C_{n,g} := \tfrac12\sigma(g,n)d(g) = \frac{d(g)}{2n\sqrt{g}}.$$

In particular, in U-statistic notation, we have

$$U_n^{(2)}(L) - EU_n^{(2)}(L) = (T_n(g) - ET_n(g)) - (T_n(h_+) - ET_n(h_+))$$

and, recalling the Hoeffding decomposition (2), we have

$$U_n^{(2)}(L) - EU_n^{(2)}(L) = 2U_n^{(1)}(\pi_1 L) + U_n^{(2)}(\pi_2 L).$$

So, to estimate the right-hand side of (11), it suffices to bound

$$\Pr\{|U_n^{(1)}(\pi_1 L)| > \tau C_{n,g}/2\}\quad\text{and}\quad \Pr\{|U_n^{(2)}(\pi_2 L)| > (1-\tau)C_{n,g}\}$$

for some $0 < \tau < 1$. We will apply Bernstein's inequality (e.g., de la Peña and Giné (1999), page 166) to the linear part (the first probability) and its generalization for canonical U-statistics of order 2 (Giné, Latała and Zinn (2000)), with constants from Houdré and Reynaud-Bouret (2003), to the second.

Linear term: Noting that $\mathrm{Var}(\pi_1 L) \le E\big(E_{X_2}L(X_1,X_2)\big)^2$ and that

$$E\big(E_{X_2}K_g(X_1 - X_2)\big)^2 = \int (K_g*f_0)^2(y)f_0(y)\,dy \le \|K\|_1^2\|f_0\|_\infty^2$$

by Young's inequalities, and likewise for $K_{h_+}$, we have

$$\mathrm{Var}(\pi_1 L) \le 4\|K\|_1^2\|f_0\|_\infty^2 =: D_1.$$

Moreover, again by Young's inequalities,

$$\|\pi_1 L\|_\infty \le 4\|K\|_1\|f_0\|_\infty =: D_2.$$

Hence, Bernstein's inequality gives

$$\Pr\big(|U_n^{(1)}(\pi_1 L)| > \tau C_{n,g}/2\big) \le 2\exp\left\{-\frac{n\tau^2C_{n,g}^2/4}{2D_1 + (2/3)D_2\tau C_{n,g}/2}\right\}.$$

Since $g \ge n^{-2}(\log n)^4$, $C_{n,g} \to 0$ as $n \to \infty$ and, since $g \le h_2 = n^{-1}\ell(n)$, we have $nC_{n,g}^2 = d^2(g)/(4ng) \ge d^2(g)/(4\ell(n))$, where we recall that $\ell(n) \to 0$, so we obtain, for any given $\tau$, that there exists $N_\tau$ such that, for all $n \ge N_\tau$,

$$\Pr\big(|U_n^{(1)}(\pi_1 L)| > \tau C_{n,g}/2\big) \le 2\exp\{-d^2(g)/M\}. \qquad (12)$$
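For reference, the form of Bernstein's inequality used for (12) is $\Pr(|n^{-1}\sum_{i=1}^n\xi_i| > t) \le 2\exp\{-nt^2/(2v + (2/3)bt)\}$ for i.i.d. centered $\xi_i$ with $\mathrm{Var}(\xi_1) \le v$ and $|\xi_1| \le b$; above, $v = D_1$, $b = D_2$ and $t = \tau C_{n,g}/2$. The sketch below, with uniform variables chosen only for illustration, evaluates this bound and compares it with a Monte Carlo estimate of the tail probability.

```python
import numpy as np


def bernstein_bound(n, t, v, b):
    """2 exp(-n t^2 / (2 v + (2/3) b t)), a bound on P(|mean of n centered terms| > t)."""
    return 2.0 * np.exp(-n * t**2 / (2.0 * v + (2.0 / 3.0) * b * t))


def empirical_tail(n, t, reps=20000, seed=0):
    """Monte Carlo estimate of P(|mean| > t) for xi_i ~ Uniform[-1, 1] (Var = 1/3, |xi| <= 1)."""
    rng = np.random.default_rng(seed)
    means = rng.uniform(-1.0, 1.0, size=(reps, n)).mean(axis=1)
    return float(np.mean(np.abs(means) > t))


n, t = 50, 0.2
print(bernstein_bound(n, t, v=1.0 / 3.0, b=1.0), empirical_tail(n, t))
```

The bound is deliberately crude; in the proof only its exponential dependence on $nC_{n,g}^2$ matters.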

Second-order term: We first state the inequality for canonical U-statistics that we are going to use (Theorem 3.4 in Houdré and Reynaud-Bouret (2003)): Let $R(x,y)$ be a symmetric function of two variables such that $ER(X_1, x) = 0$ for all $x$ and let

$$A_1^2 = \frac{n(n-1)}{2}ER^2(X_1,X_2),$$

$$A_2 = n\sup\left\{E\big(R(X_1,X_2)\zeta(X_1)\xi(X_2)\big) : E\zeta^2(X_1) \le 1,\ E\xi^2(X_1) \le 1\right\},$$

$$A_3 = \left\|nE_{X_1}R^2(X_1,\cdot)\right\|_\infty^{1/2}, \qquad A_4 = \|R\|_\infty.$$

Then, for every $\varepsilon > 0$, there exist finite non-zero numbers $\eta(\varepsilon)$, $\beta(\varepsilon)$ and $\gamma(\varepsilon)$ such that the following is true for all $u > 0$ and $n \in \mathbb{N}$:

$$\Pr\left(\frac{n(n-1)}{2}\big|U_n^{(2)}(R)\big| > 2(1+\varepsilon)^{3/2}A_1u^{1/2} + \eta(\varepsilon)A_2u + \beta(\varepsilon)A_3u^{3/2} + \gamma(\varepsilon)A_4u^2\right) \le 6\exp\{-u\}.$$

We apply this inequality for $R = \pi_2 L$ and $u = d^2(g)/M$ to obtain the desired bound for $\Pr(|U_n^{(2)}(\pi_2 L)| > (1-\tau)C_{n,g})$, with a small $\tau$ to be chosen below. So, we need to show that

$$2(1+\varepsilon)^{3/2}A_1u^{1/2} + \eta(\varepsilon)A_2u + \beta(\varepsilon)A_3u^{3/2} + \gamma(\varepsilon)A_4u^2 \le (1-\tau)\frac{n(n-1)}{2}C_{n,g}$$

for the specified choice of $u$. First, since

$$EK_g^2(X_1 - X_2) = \int\int K_g^2(x-y)f_0(x)f_0(y)\,dx\,dy = \int (K_g^2 * f_0)(x)f_0(x)\,dx \le \|f_0\|_2^2\|K_g^2\|_1 = g^{-1}\|K\|_2^2\|f_0\|_2^2$$

and, likewise, if $g$ is replaced by $h_+ > g$, we obtain

$$A_1^2 \le 2n(n-1)g^{-1}\|K\|_2^2\|f_0\|_2^2.$$

Taking $\varepsilon$ and $\tau$ so that $(1+\varepsilon)^{3/2} = 1.1$ and $12(1-2\tau) = 11.4$, it follows that, for all $n$ such that $\sqrt{n/(n-1)} \le 1.1$,

$$2(1+\varepsilon)^{3/2}A_1u^{1/2} \le (1-2\tau)\frac{n(n-1)}{2}C_{n,g}.$$






E. Gine and R. Nickl 



For the second term,

$$\big|E[K_g(X_1 - X_2)\zeta(X_1)\xi(X_2)]\big| \le \|K_g*(\xi f_0)\|_2\|\zeta f_0\|_2 \le \|K\|_1\|f_0\|_\infty.$$

Similarly,

$$\big|E[E_{X_1}K_g(X_1 - X_2)\,\zeta(X_1)\xi(X_2)]\big| \le \|K\|_1\|f_0\|_\infty, \qquad \big|EK_g(X_1 - X_2)\big| \le \|K\|_1\|f_0\|_\infty,$$

and also for $K_{h_+}$. Thus,

$$E[\pi_2L(X_1,X_2)\zeta(X_1)\xi(X_2)] \le 8\|K\|_1\|f_0\|_\infty,$$

so that $A_2 \le 8\|K\|_1\|f_0\|_\infty n$. This gives that

$$\eta(\varepsilon)A_2u = o(n^2C_{n,g})$$

since $\sqrt{g}\,d(g) \to 0$ as $n \to \infty$. For the third term, we have that, for every $x \in \mathbb{R}$,

$$\big|E_{X_1}(\pi_2L)^2(X_1, x)\big| \le c\left[\|K\|_2^2\|f_0\|_\infty g^{-1} + \|K\|_1^2\|f_0\|_\infty^2\right]$$

for a numerical constant $c$. Then,

$$\beta(\varepsilon)A_3u^{3/2} \le \beta(\varepsilon)\left(cn\left[\|K\|_2^2\|f_0\|_\infty g^{-1} + \|K\|_1^2\|f_0\|_\infty^2\right]\right)^{1/2}\left(\frac{d^2(g)}{M}\right)^{3/2} = O\left(\sqrt{n/g}\,d^3(g)\right),$$

which is $o(n^2C_{n,g})$ because $\sqrt{n}/d^2(g) \to \infty$. As for the last term, we have $A_4 = \|\pi_2L\|_\infty \le 8\|K\|_\infty/g$ (since $\|L\|_\infty \le 2\|K\|_\infty/g$) and hence $A_4u^2 \le Cd^4(g)/g$, which is also $o(n^2C_{n,g})$ because $d^3(g)$ is of the order of $(\log n)^{3/2}$, whereas $n\sqrt{g} \ge n\sqrt{n^{-2}(\log n)^4} = (\log n)^2$. We conclude that, for the specified $\tau$ and for all $n$ large enough,

$$\Pr\big(|U_n^{(2)}(\pi_2L)| > (1-\tau)C_{n,g}\big) \le 6\exp\{-d^2(g)/M\}. \qquad (13)$$

Inequalities (12) and (13) give

$$\Pr\Big(\big|T_n(g) - ET_n(g) - (T_n(h_+) - ET_n(h_+))\big| > \tfrac12\sigma(g,n)d(g)\Big) \le 8\exp\{-d^2(g)/M\}.$$

The lemma now follows from this bound, (11), the fact that if $g \le h$ then

$$\exp\{-d^2(g)/M\} \le \exp\{-d^2(h)/M\}$$

and the definition of the grid $\mathcal{H}$. □